Estimating Quantiles from the Union of Historical and Streaming Data
نویسندگان
چکیده
Modern enterprises generate huge amounts of streaming data, for example, micro-blog feeds, financial data, network monitoring and industrial application monitoring. While Data Stream Management Systems have proven successful in providing support for real-time alerting, many applications, such as network monitoring for intrusion detection and real-time bidding, require complex analytics over historical and real-time data over the data streams. We present a new method to process one of the most fundamental analytical primitives, quantile queries, on the union of historical and streaming data. Our method combines an index on historical data with a memory-efficient sketch on streaming data to answer quantile queries with accuracy-resource tradeoffs that are significantly better than current solutions that are based solely on disk-resident indexes or solely on streaming algorithms.
منابع مشابه
Frugal Streaming for Estimating Quantiles: One (or two) memory suffices
Modern applications require processing streams of data for estimating statistical quantities such as quantiles with small amount of memory. In many such applications, in fact, one needs to compute such statistical quantities for each of a large number of groups, which additionally restricts the amount of memory available for the stream for any particular group. We address this challenge and int...
متن کاملEstimation of E(Y) from a Population with Known Quantiles
‎In this paper‎, ‎we consider the problem of estimating E(Y) based on a simple random sample when at least one of the population quantiles is known‎. ‎We propose a stratified estimator of E(Y)‎, ‎and show that it is strongly consistent‎. ‎We then establish the asymptotic normality of the suggested estimator‎, ‎and prove that it ...
متن کاملFrugal Streaming for Estimating Quantiles
Modern applications require processing streams of data for estimating statistical quantities such as quantiles with small amount of memory. In many such applications, in fact, one needs to compute such statistical quantities for each of a large number of groups (e.g.,network traffic grouped by source IP address), which additionally restricts the amount of memory available for the stream for any...
متن کاملEstimating Aggregate Properties on Probabilistic Streams
The probabilistic-stream model was introduced by Jayram et al. [16]. It is a generalization of the data stream model that is suited to handling \probabilistic" data where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over potentially a very large number of classical \deterministic" streams...
متن کاملSummarizing Two-Dimensional Data with Skyline-Based Statistical Descriptors
Much real data consists of more than one dimension, such as financial transactions (eg, price × volume) and IP network flows (eg, duration × numBytes), and capture relationships between the variables. For a single dimension, quantiles are intuitive and robust descriptors. Processing and analyzing such data, particularly in data warehouse or data streaming settings, requires similarly robust and...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- PVLDB
دوره 10 شماره
صفحات -
تاریخ انتشار 2016